A Method for Automating Text Markup

Authors

  • Rashmi K. Iyengar
  • R. M. Malyankar
Abstract

Markup languages based on XML are increasingly popular, and languages for other formats such as RDF are under active development. One problem in converting legacy documents to XML or other markup formats is inserting tags into the document and rearranging the text as required when markup is added to an existing, unmarked document. This paper describes a method for automating part of the process of marking up such legacy documents. The approach is designed for semi-structured text documents such as technical documentation and narrative descriptions.

Similar resources

Automating XML markup of text documents

We present a novel system for automatically marking up text documents into XML and discuss the benefits of XML markup for intelligent information retrieval. The system uses the Self-Organizing Map (SOM) algorithm to arrange XML marked-up documents on a two-dimensional map so that similar documents appear closer to each other. It then employs an inductive learning algorithm C5 to automatically ex...

Automating XML Markup using Machine Learning Techniques

In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents having similar content are placed close to each other. The C5.0 algori...

From XML to XML: The Why and How of Making the Biodiversity Literature Accessible to Researchers

We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These were developed during our ongoing work on automating the markup of scanned copies of the biodiversity literature. Such automation is required if historic literature is to be used to inform contemporary issues in biodiversity research. We consider a...

Model annotation for synthetic biology: automating model to nucleotide sequence conversion

Motivation: The need for the automated computational design of genetic circuits is becoming increasingly apparent with the advent of ever more complex and ambitious synthetic biology projects. Currently, most circuits are designed through the assembly of models of individual parts such as promoters, ribosome binding sites and coding sequences. These low-level models are combined to produce a dyn...

Modeling Markup Shocks Using a DSGE Model (The Case of Iran)

This paper investigates the effects of markup shocks of domestic and export goods prices on macroeconomic variables by using a Dynamic Stochastic General Equilibrium (DSGE) model for Iran, in order to examine the effect of the growth of market power and monopoly in domestic and exporting markets from a macroeconomic viewpoint. To this end, the optimal pricing process of domestic, importing and ...

Publication date: 2002